Bound Docking by Optimization of 
Electrostatic Interactions in Protein 

Complexes 

A Thesis Submitted in Partial Fulfilment of the
Requirements for the Degree of 

Master of Technology 


by 

Preeti Kumari 
Roll No:- Y3118007 



to the 


DEPARTMENT OF BIOLOGICAL SCIENCES AND 
BIOENGINEERING 

INDIAN INSTITUTE OF TECHNOLOGY KANPUR, INDIA 

MAY, 2005 






Dated: 3 MAY, 2005.


CERTIFICATE 


This is to certify that the work under the thesis titled “Bound Docking 
by Optimization of Electrostatic Interaction in Protein Complexes” 
by Preeti Kumari (Roll No. Y3118007) has been carried out under our 
supervision and this work has not been submitted elsewhere for a degree. 






Dr. Bhaskar Dasgupta 

Department of Mechanical Engineering

Indian Institute of Technology 

Kanpur 

India. 


Dr. Balaji Prakash
Department of Biological Sciences
and Bioengineering
Indian Institute of Technology 
Kanpur 
India. 



Dedicated to 


My Beloved Parents 



Acknowledgment 


I would like to take this opportunity to express my deep sense of gratitude to
my thesis supervisor and teacher Dr. Bhaskar Dasgupta for allowing me
to take up this challenging work under his supervision. I would like to thank
him for his faith in me as a student to learn new things and for giving his
valuable and timely advice. Finally, I would like to thank him for his invaluable
guidance and relentless encouragement.

I would also like to thank my thesis advisor Dr. Balaji Prakash, for
encouraging me to take up this challenging work outside my department. I
would like to thank him for his constant faith in me to do something different
and good for my thesis. I would also like to thank the head of my department,
Prof. Pradip Sinha, for allowing me to choose a thesis topic to be worked
out in another department. I would also like to thank Dr. R. Sankararamakrishnan
for his advice and encouragement at times of need. I would
like to express my gratitude to Dr. Savitha Govardhan, Dr. Sumit
Basu and Dr. Nandini Gupta for their kind concern.

I would especially like to thank my friend Dhiraj for enriching me by sharing
his professional and emotional experiences and encouraging me to meet the
challenge of learning new things for my thesis. I would also like to convey my
gratitude to my friend and labmate Hari for giving me his valuable advice
whenever it was required for my thesis. I would also like to thank my friends Patelji,
Ranja, Nikks, Udit and Negi for sharing their love and warmth and providing
me with a friendly atmosphere and cherishable moments at IITK.

I would like to thank my parents for enabling me to reach this stage in
my life. It is always their blessings and encouragement which give me the
courage to face every new challenge of my life with equal fervor and enthusiasm.

And above all, I thank the Almighty for enabling me to achieve higher goals. 

Preeti Kumari
I.I.T. Kanpur
3 MAY, 2005. 




Abstract 


The algorithm presented in this thesis aims at reconstructing a protein-inhibitor
(protein) complex from the unbound protein and inhibitor structures
by searching for a minimal energy conformation on the basis of electrostatic
interaction. The minimal energy conformation is obtained by
optimizing the electrostatic interaction. To calculate the electrostatic
interaction between the protein and the inhibitor at the binding site,
a point-to-point Coulombic interaction method has been used, based on the
continuum dielectric solvation model. The algorithm has been tested on ten
different protein-inhibitor complexes. From the results obtained it can be
concluded that the role played by electrostatic interaction at the binding site
of a protein-inhibitor complex is case dependent: it plays a critical
role when both interacting surfaces carry patches of opposite potential,
i.e. oppositely charged side chains at the interface. The algorithm can be used for
secondary screening of the candidate solutions obtained by an
initial screening based on geometric complementarity, in order to improve
the ranks of nearly correct solutions. The degree of success in improving
the rank will, however, be case dependent, i.e. it depends on the presence of different
potential patches at the interacting surfaces.





Contents 


Certificate
Acknowledgments
Abstract
List of Figures
List of Protein-Inhibitor Complexes used

1 Introduction
1.1 Structure and Function of Proteins
1.1.1 Basic Building Blocks: Amino Acids
1.1.2 Primary, Secondary, Tertiary and Quaternary Structure
1.1.3 Forces Stabilizing the Protein Structure
1.1.4 Mode of Protein-Protein/Ligand Interactions
1.1.5 Summary
1.2 Protein-Ligand Docking
1.2.1 Literature Review

2 Formulation and Algorithm Developed for the Docking Problem
2.1 Objective of the Algorithm
2.2 Description of the Algorithm
2.2.1 Collection of Data from PDB and Generating an Array of Atom Co-ordinates
2.2.2 Identification of the Binding-site of Protein and Ligand
2.2.3 Translation and Rotation of the Ligand
2.2.4 Calculation and Optimization of the Electrostatic Interaction
2.2.5 Calculation of the Relative Error between Docked and Original Complexes
2.3 Summary

3 Results and Discussions
3.1 Results
3.2 Discussion

4 Conclusions
4.1 Summary
4.2 Scope of Future Work

Bibliography

Appendix A
A Input File Format
B Amino-acids Codes and Chemical Structure
C Some Important URLs



List of Figures 

1.1 Amino acid structure
1.2 Peptide bond formation in amino acids
1.3 φ, ψ and ω angles in the peptide chain
1.4 Hydrogen Bonding
3.1 Relative distances (Å) in docked complex vs. original complex: 1LDT
3.2 Relative distances (Å) in docked complex vs. original complex: 1AVW
3.3 Relative distances (Å) in docked complex vs. original complex: 1BRS
3.4 Relative distances (Å) in docked complex vs. original complex: 1BVN
3.5 Relative distances (Å) in docked complex vs. original complex: 1CHO
3.6 Relative distances (Å) in docked complex vs. original complex: 1FSS
3.7 Relative distances (Å) in docked complex vs. original complex: 2PTC
3.8 Relative distances (Å) in docked complex vs. original complex: 1SMF
3.9 Relative distances (Å) in docked complex vs. original complex: 2SEC
3.10 Relative distances (Å) in docked complex vs. original complex: 1TEC
3.11 Relative distances (Å) in docked complex vs. original complex: 1SMF
3.12 Relative distances (Å) in docked complex vs. original complex: 2SEC
3.13 Relative distances (Å) in docked complex vs. original complex: 1TEC





LIST OF PROTEIN-INHIBITOR
COMPLEXES USED


Bound Complex      Unbound Protein/Inhibitor
(PDB Codes)        (PDB Codes)
1LDT               1EPT/1LDT
1AVW               1EPT/1AVU
1BRS               1BNI/1BTA
1BVN               1PIF/2AIT
1CHO               5CHA/1OVO
2PTC               2PTN/4PTI
1FSS               2ACE/1FSC
1SMF               2PTN/1PI2
2SEC               1SCD/1TEC
1TEC               1THM/2SEC





Chapter 1 
Introduction 


1.1 Structure and Function of Proteins 

1.1.1 Basic Building Blocks: Amino Acids

Amino acids are the basic building blocks of proteins. There are 20 different
amino acids found in all proteins. All twenty amino acids have in common
a central carbon atom (Cα) to which are attached a hydrogen atom, an amino
group (NH2) and a carboxyl group (COOH), as shown in Fig 1.1. One
amino acid differs from another in the side chain (R)
attached to the Cα through its fourth valence; there are 20 different types
of side chains specified by the genetic code and thus 20 different amino acids.
The amino acids are abbreviated with both three-letter and one-letter codes,
which are listed along with their chemical structures in Appendix B.

Figure 1.1: Amino acid structure

The amino acids are usually divided into three different classes depending
on the chemical nature of their side chains. The first class comprises those
with non-polar or hydrophobic side chains: ALA, VAL, LEU, ILE, PHE,
PRO and MET. The second class consists of the four charged residues: ASP, GLU,
LYS and ARG. The third class comprises those with polar side chains:
SER, THR, CYS, ASN, GLN, HIS, TYR and TRP. The amino acid GLY
has only a hydrogen atom as a side chain; it is the simplest amino acid and
has special properties which can be used in protein structure determination.
Due to the chiral Cα atom in amino acids (except GLY), they can be present
in two forms, the L-form and the D-form; all the amino acids that occur in
proteins are in the L-form. Amino acids are joined end to end during protein
synthesis by peptide bonds, which are formed when the carboxyl (COOH)
group of one amino acid condenses with the amino (NH2) group of
the next with the elimination of water, as seen in Fig 1.2.



Figure 1.2: Peptide bond formation in amino acids


1.1.2 Primary, Secondary, Tertiary and Quaternary Structure

The primary structure of a segment of a polypeptide chain or of a protein
is the amino-acid sequence of the polypeptide chain. The Cα atoms of amino
acids form the main-chain atoms of the protein backbone, to which the side
chains are attached. The sequence and properties of side chains determine
all that is unique about a particular protein, including its biological function
and its specific three-dimensional structure.

The secondary structure of a segment of polypeptide chain is the local
spatial arrangement of its main-chain atoms without regard to the conformation
of its side chains or to its relationship with other segments. There are
three common secondary structures in proteins, namely alpha helices, beta
sheets and turns. An extremely useful device for studying protein conformation
is the Ramachandran plot, which plots φ against ψ (Fig 1.3). The values of
φ and ψ that are possible are constrained geometrically due to steric clashes
between neighboring side chains. The values of φ and ψ can be plotted on
a two-dimensional map of the φ-ψ plane, which shows allowed and disallowed
regions.

Figure 1.3: φ, ψ and ω angles in the peptide chain

Regular secondary structure conformations in segments of a
polypeptide chain occur when all the φ torsion angles in that polypeptide
segment are equal to each other, and all the ψ torsion angles are equal. The
alpha-helix and beta-sheet conformations for polypeptide chains are
generally the most thermodynamically stable of the regular secondary structures.
However, particular amino acid sequences of a primary structure in
a protein may support regular conformations of the polypeptide chain other
than alpha-helical or beta-structure. Thus, whereas alpha-helical or beta-structure
are found most commonly, the actual conformation is dependent
on the particular physical properties generated by the sequence present in
the polypeptide chain and the solution (surrounding) conditions in which the
protein is present. In addition, in most proteins there are significant regions
of disordered structure in which the φ and ψ angles are not repetitive; these
are called loop regions.
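
The φ and ψ torsion angles discussed above can be computed directly from backbone atom positions. The following is a minimal sketch and not part of the thesis algorithm; the function name `dihedral` and the tuple representation of points are illustrative assumptions. By the usual convention, φ of residue i is the torsion C(i-1)-N(i)-Cα(i)-C(i) and ψ is the torsion N(i)-Cα(i)-C(i)-N(i+1).

```python
import math


def _sub(a, b):
    # Component-wise difference a - b of two 3D points.
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    Hypothetical helper for illustration: pass the four backbone atoms
    that define phi or psi, each as an (x, y, z) tuple.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal of the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

With this definition a cis arrangement of the four atoms gives 0° and a trans arrangement gives ±180°, so each residue can be placed as a point on the φ-ψ map.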

The alpha helices are found when a stretch of consecutive residues all
have the (φ, ψ) angle pair approximately -60° and -50°. The α helix has 3.6
residues per turn, with hydrogen bonds between the C=O of residue n and the NH
of residue n+4. Thus, all NH and CO groups are held with H-bonds except
the first NH groups and the last CO groups, which are at the ends of the α
helix, making its ends polar. Some variants of the alpha helix are the 3₁₀ and π
helices, but they are rarely present in proteins. The α helices can vary in
length from four or five to over forty residues.

The second major structural element found in proteins is the β sheet.
β sheets in turn are made up of structural units called β strands,
which are 5 to 10 residues long, with (φ, ψ) angles ranging from -120° to
120°. Beta sheets are made up from a combination of several regions of the
polypeptide chain, unlike α helices, which are built from a single continuous
region. The β strands are aligned adjacent to each other such that hydrogen
bonds can form between the C=O groups of one strand and the NH groups of an
adjacent β strand. The β sheets formed from several β strands have a pleated
appearance, because the Cα atoms lie successively a little above and below the plane
of the β sheet. When the β strands are arranged parallel to each other
the sheet is described as parallel, while if they are arranged antiparallel the
sheet is called antiparallel.

Most proteins are built up of certain combinations of secondary structural
elements, α helices and β sheets, which are connected by loop regions of various
lengths and irregular shapes; such a combination of structures is called the
tertiary structure of the protein. The combination of the secondary structural
elements in the tertiary structure tends to form a stable hydrophobic
core of the molecule. The loop regions are exposed to the solvent and are
rich in charged and polar side chains. In addition to forming the
connection between the secondary structural elements, these regions also frequently
participate in forming binding sites and enzyme active sites.

When a protein is made of more than one polypeptide chain, the structure
formed by the assembly and spatial arrangement of all the polypeptide chains
is called the quaternary structure of the protein; the polypeptide chains
can be the same or different. Such proteins are called multi-domain proteins.

1.1.3 Forces Stabilizing the Protein Structure 

Proteins perform many of the functional roles necessary for living systems.
The function of most proteins is based upon the unique conformation that
their polypeptide chain adopts in solution (i.e. the native or folded conformation).
The folded conformation is determined by the primary sequence
and the interaction of amino acid side chains with the solvent (surrounding
medium). The pH, the ionic composition and concentration, and the solvent
dielectric can influence the electrostatic interactions that stabilize the folded
conformation. Following are some non-covalent interactions stabilizing protein
structures.

Hydrogen Bonds: A hydrogen bond is formed when a hydrogen atom with
a large positive partial charge interacts with an atom with a large negative
partial charge. The opposite charges attract each other, and the hydrogen
atom, which is covalently bound to the hydrogen bond donor atom, comes
very close to the hydrogen bond acceptor atom with its lone pairs. In general,
the two partial charges (positive and negative) are part of dipoles, which
causes the positive hydrogen to be positioned between two electronegative
atoms, as seen in Fig 1.4. In protein molecules, polar and charged side chains
of amino acids take part in hydrogen bonding. Hydrogen bonding plays an
important role in forming the secondary and tertiary structure of protein
molecules and in protein-protein association.

Figure 1.4: Hydrogen Bonding

Hydrophobic Interactions: Hydrophobic interactions are the most
important non-covalent force causing the linear polypeptide to fold into
a compact structure. However, it is not the interaction between the side chains of
hydrophobic amino acids (which is mainly van der Waals interaction) that
induces the strong interaction, but the increase in entropy gained by the
removal of hydrophobic surface from the aqueous (polar) environment. The
aggregation of the hydrophobic surfaces (consisting of non-polar residues)
forms the tightly packed core of a protein; it is a primary driving force for
protein folding, since it removes the non-polar side chains from exposure to the polar
solvent. Some non-polar groups or hydrophobic patches are found
on the surface of protein molecules as determinants of preferred sites of
molecular association [10].

Electrostatic Interactions: Proteins bear many polar and charged side
chains, mostly on their surface, which play an important role in protein-protein
interaction. In one study [3] it has been found that the electrostatic energy
of interaction between protein complexes strongly correlates with the rate
of association. At the protein interface there is optimization of the interaction
between charged and polar groups, which in turn produces interactions that
are stabilizing, highly directional and distance-dependent, allowing the significant
specificity that is characteristic of many recognition processes involving
biological macromolecules. The electrostatic interaction can be described by
Coulomb's interaction method, in which the energy is related to the inverse
of the distance between the interacting charges. This method is used in the
algorithm presented in the thesis for the calculation of electrostatic energy.
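
As an illustration of the inverse-distance dependence just described, the following is a minimal sketch of a point-to-point Coulombic sum between two sets of partial charges. It is not the thesis implementation: the function name `coulomb_energy`, the tuple layout `(x, y, z, charge)`, the uniform dielectric value of 80 and the approximate conversion factor of 332.06 (charges in units of e, distances in Å, energy in kcal/mol) are all assumptions made for this example.

```python
import math

# Approximate conversion factor (assumed unit system): charges in units of
# the elementary charge and distances in angstroms give kcal/mol.
COULOMB_CONSTANT = 332.06


def coulomb_energy(protein_atoms, ligand_atoms, dielectric=80.0):
    """Sum k * q_i * q_j / (eps * r_ij) over all protein-ligand atom pairs.

    Each atom is a tuple (x, y, z, charge). The uniform dielectric
    constant is a placeholder assumption, not a value from the thesis.
    """
    energy = 0.0
    for xi, yi, zi, qi in protein_atoms:
        for xj, yj, zj, qj in ligand_atoms:
            r = math.dist((xi, yi, zi), (xj, yj, zj))
            energy += COULOMB_CONSTANT * qi * qj / (dielectric * r)
    return energy
```

Opposite charges give a negative (favourable) contribution and like charges a positive one, so a lower total corresponds to a more favourable relative placement of the two molecules.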

Van der Waals Interactions: Important contributions to protein stability
are given by the London dispersion forces (attractive and distant) and
electron shell repulsion (repulsive and close). The attractive component is
due to the induction of dipoles in the electron clouds of neighboring atoms;
coupling of the dipoles leads to attractive forces. The repulsive component
is due to the steric hindrance that arises when the electron clouds of
neighboring atoms start to overlap.
The attractive (distant) and the repulsive (close)
components are usually taken together and described by the Lennard-Jones
potential as follows:

E = E_m \left[ -\left( \frac{R_m}{R_{ij}} \right)^{12} + 2 \left( \frac{R_m}{R_{ij}} \right)^{6} \right]    (1.1)

The second term shows weak attraction at long distance (power of 6 term)
and the first term shows strong repulsion at very close distance (power of 12
term). At bonding distances there is an energy well at R_m, where the van der
Waals energy is at a minimum (E_m). The repulsion energy is small for the
closest contact distance, which is the sum of the so-called van der Waals
radii for the two atoms.
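
A short numerical sketch can make the shape of Eq. (1.1) concrete. It assumes the equation reads E = E_m[-(R_m/R_ij)^12 + 2(R_m/R_ij)^6] with a negative well depth E_m, which is the reading under which the 12th-power term is repulsive and the energy equals E_m at R_ij = R_m. The function name and the parameter values used in the checks are illustrative only.

```python
def lennard_jones(r_ij, r_min, e_min):
    """Evaluate E = e_min * (-(r_min/r_ij)**12 + 2 * (r_min/r_ij)**6).

    Assumes e_min is negative (the depth of the energy well) and that
    r_min is the separation at which the energy is lowest.
    """
    ratio6 = (r_min / r_ij) ** 6
    return e_min * (-(ratio6 ** 2) + 2.0 * ratio6)
```

Under these assumptions the energy equals `e_min` at `r_ij == r_min`, rises steeply and becomes positive as the atoms are pushed closer together, and approaches zero from below at large separations.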

1.1.4 Mode of Protein-Protein/Ligand Interactions 

Protein-protein association involves the specific (at times non-specific)
complementary recognition of two macromolecules to form a stable assembly.
The formation and stabilization of the complex involves various non-covalent
interactions occurring at the interface: electrostatic interaction,
hydrogen bonding and the presence of hydrophobic patches at the surface [10] are
some of the important driving forces. Formation of the complex reduces
the net charge on the molecules,
through interaction with oppositely charged side chains present
on the surface of the molecules, and buries the exposed non-polar side chains
present at the surface.

Following are the two hypotheses suggested for the mode of interaction
between a protein (such as an enzyme) and its substrate molecule.

Lock and Key Hypothesis 

An enzyme (a protein molecule) is globular and very large, but only a small
part of it, the active site, is involved in any reaction. When the shape of
the active site matches that of the substrate molecule, the substrate
molecule fits into the active site and is held there until the reaction is completed.
The product is then released and the enzyme is once again ready to
take part in another reaction. This is known as the lock and key hypothesis.
The active site has a distinct shape, rather like a lock. Just as only the right
key will fit a lock, so only the right substrate has the right shape to fit into
the active site.

Induced-fit Hypothesis 

The lock and key hypothesis does not explain the interaction completely:
in that case some small molecules like water may enter the active site and
interfere in the reaction, and moreover the hypothesis does not allow for any flexibility of
the binding site. A refinement of this hypothesis was therefore suggested later,
the induced-fit hypothesis. According to this hypothesis [30], the substrate
(ligand) does not simply bind with the active site. It has to bring about
changes to the shape of the active site to activate the enzyme (catalytic
protein) and make the reaction possible. So small molecules may enter the
active site, but they cannot induce the changes in shape needed to make the enzyme
active. The hypothesis suggests that when the enzyme's active site comes into
contact with the right substrate, the active site slightly changes or moulds
itself around the substrate for an effective fit. This shape adjustment triggers
catalysis (reaction) and helps to explain why certain enzymes only catalyse
specific reactions.

1.1.5 Summary 

Proteins are macromolecules made up of one or more polypeptide chains,
which are in turn made up of 20 different kinds of amino acids. A typical protein
contains 200-300 amino acids, but some can be much smaller; these are
called peptides. Proteins play an important role in the fundamental processes
of the cell. Their function is determined by their primary sequence, which
in turn determines their structure. The α helix and β sheets are the most
stable and most commonly found secondary structural elements in proteins.
The stability of protein structure and its interaction with other proteins
depends on many non-covalent interactions such as electrostatic interactions,
hydrogen bonding, hydrophobic interactions and Van der Waals interactions.
The binding sites of proteins have considerable flexibility, as suggested by the
induced-fit hypothesis and observed in various experimental studies.





1.2 Protein-Ligand Docking 


1.2.1 Literature Review 

Definition and Aim of Protein-ligand Docking 

Protein-ligand docking can be defined as: for two given biological molecules,
determine whether they interact and, if they interact, determine the orientation
of their maximal interaction while minimizing the energy of the complex [1].
Here, the term ligand means either a protein molecule or a chemical
agent such as a drug. The aim or goal of protein-ligand docking can be defined
as: to be able to search a database of molecular structures and retrieve all
molecules that can interact with the query structure.

Docking is basically of two types, bound docking and unbound docking [1].
Bound docking deals with computational schemes that try to regenerate a
complex from the bound structures of the protein and ligand. In this
case the binding site is known beforehand; such structures are mostly obtained
from co-crystallized structures. Unbound docking deals with those
computational schemes that try to regenerate a complex from the unbound
structures of the protein and ligand. This is the more difficult part of
docking, as here a priori knowledge of the binding sites is not available [1].

Different Phases of Docking

The process of docking can be divided into three phases, namely:

1. Preprocessing Phase: This phase deals with the mathematical
representation of the system, i.e. mapping the three-dimensional surface of
the receptor and ligand. Surface representation is mostly geometric.
The most common method used for this is the Connolly
surface representation [13]. The Connolly surface consists of the part
of the Van der Waals surface of the atoms that is accessible to a probe
sphere (contact surface), connected by a network of convex, concave
and saddle-shaped surfaces that smooths over the crevices and
pits between the atoms. It is a method to describe the surface on the basis
of sparse critical points. A surface normal is generated at each point;
the need is then to detect a pair of critical points in both molecules that
share the same internal distance and, if superimposed, have opposing
surface normals.

The other commonly used method for surface representation is the grid
method [2], i.e. representing the 3D surface of the protein on a fine
grid and assigning different scores to the points falling on the surface
and in space, and a penalty to those that interpenetrate. This penalty
should be decided carefully: it should be neither too high nor too
low, so that a certain flexibility is allowed at the interface.

2. Recognition Phase: It is the most important and critical phase of
docking. It involves recovering candidate ligands from the database
generated in the pre-processing phase, matching the receptor/protein's
surface patches and ranking the candidates on the basis of the scores
obtained [1].

3. Post-processing Phase: This phase deals with filtering the best
candidates out of the top-ranking candidates obtained in the recognition
phase. For this, electrostatic interactions, solvation energy and
other kinds of interactions occurring at the interface can be
used as criteria for filtering the minimal energy
state candidates out of the best [1].

Scoring Functions 

Scoring functions are used to score the candidates and rank them on the
basis of the scores obtained. Thus, scoring helps in detecting correct solutions
with low ranks and those having minimum RMS deviations from the crystal
complex [1]. Based on the scoring functions used by an algorithm, docking
can be further divided into two types, i.e. geometric docking and integrated
docking algorithms [2]. The former kind of docking takes into consideration only
the shape complementarity, ignoring any other kind of interactions, such as
electrostatic interactions, occurring at the surface. The latter kind of
docking also takes into consideration some energy functions, such as electrostatic
interactions [12], solvation energy, H-bonding etc. [5] occurring at the interface,
for scoring the solutions. The most often used scoring functions
are the following:





Geometric / Shape Complementarity 


This scoring function scores the complementarity of molecular shapes at the
binding interface of the protein and the ligand. It is based on geometric
features of the surface of the interacting molecules, rewarding surface contact,
penalizing overlaps, and rejecting serious overlaps.
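
The reward/penalty idea can be sketched on a discretized grid. This is a hypothetical illustration only, not the scoring function of any particular docking program or of this thesis; the function name `shape_score`, the representation of molecules as sets of integer grid cells, and the numerical reward, penalty and rejection threshold are all assumptions.

```python
def shape_score(receptor_surface, receptor_core, ligand_cells,
                contact_reward=1.0, overlap_penalty=10.0, max_overlaps=2):
    """Score one ligand placement on a grid.

    receptor_surface, receptor_core and ligand_cells are sets of integer
    (i, j, k) grid cells. Ligand cells on the receptor surface layer are
    rewarded, ligand cells inside the receptor core are penalized, and a
    placement with more than max_overlaps core cells is rejected (None).
    All numerical values are illustrative assumptions.
    """
    contacts = 0
    overlaps = 0
    for cell in ligand_cells:
        if cell in receptor_core:
            overlaps += 1
        elif cell in receptor_surface:
            contacts += 1
    if overlaps > max_overlaps:
        return None  # serious overlap: reject this placement
    return contact_reward * contacts - overlap_penalty * overlaps
```

Keeping the penalty finite rather than rejecting every overlap is one way to tolerate a small amount of interpenetration, in line with the remark above that the penalty should be neither too high nor too low.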

Energy Functions 

Energy functions are used to evaluate how good a conformation is, as these
functions generate a value for the energy based on the conformation of the molecule.
They provide information on which conformations of the molecule are
better or worse: the lower the energy value, the better the conformation.
The actual energy value produced by the function does not provide
any useful information by itself; it is the comparison to another value that
helps in analysing which conformation is better. One may conclude that the
basic property of these functions is minimization of the energy of the docking
complex. The terms used in energy functions for the docking problem include all
kinds of non-covalent interactions such as Coulombic interaction, hydrogen
bonding, Van der Waals interaction and hydrophobicity. The energy functions
are mostly used as secondary energetic filters in docking for improving
the ranks of nearly correct solutions obtained after an initial global search
based on shape complementarity.

Scope and Limitations of Protein-Ligand Docking 

Molecular docking algorithms hold various promises for the future [1]. They
can be used in proposing potential drugs, in effect enhancing drug discovery
and reducing the work of molecular biologists to a large extent. One of their
most important benefits is going to be the ability to perform structure-based drug
design, i.e. proposing drugs based on the specific structure of the molecule
to which they best bind. In other words, one should be able to search
a database for proteins that interact with the query.

But, as with every technology, molecular docking has its limitations.
The most important ones are the incorporation of all kinds of flexibility [26]
at the binding site and the incorporation of water molecules [2]
at the interface, which play a rather important role during protein-protein
or protein-inhibitor interactions; in fact, water molecules sometimes form
important direct contacts at the interfaces. These are the two important
limitations of the existing docking algorithms [26]. Possibly, in future
these limitations will be overcome by the development of more powerful
computers and algorithms.





Chapter 2 


Formulation and Algorithm 
Developed for the Docking 
Problem 

2.1 Objective of the Algorithm 

The objective of the algorithm presented in the thesis is to reconstruct a
bound complex from unbound structures by the optimization of electrostatic
interaction at the interface of the docking site. The algorithm developed
here deals with protein-protein docking. The protein complexes used here
are protein-inhibitor complexes. The electrostatic interaction has been
calculated on the basis of point-to-point Coulombic interaction. Electrostatic
interaction is one of the important kinds of interaction occurring at
protein-protein interfaces, due to the presence of charged and polar side chains at the
binding surfaces. It has been suggested on the basis of several studies [3] that
electrostatic interaction in particular plays a key role in conferring specificity on
the binding site and, along with other kinds of
interactions, in stabilizing the complex in many cases of protein-protein association. So, the algorithm
presented here also aims at exploring the advantages and disadvantages of
docking only on the basis of electrostatic interaction. For this purpose, the
algorithm has been tested on 10 different protein-protein complexes and a
complete analysis of the results on the basis of the amino acid composition at
the interface has been done. Based on the importance of electrostatic interaction,
the algorithm also aims at developing a secondary screening energy
filter, which can be used for improving the ranks of candidate solutions
obtained after an initial screening done on the basis of geometric complementarity.





2.2 Description of the Algorithm 

The complete algorithm can be described in the following five parts:


2.2.1 Collection of Data from PDB and Generating an 
Array of Atom Co-ordinates 

The PDB (Protein Data Bank) files contain the co-ordinates of the individual
atoms in the protein molecule, together with text which describes the source
of the protein, the crystallization conditions, the crystal structure and
the refinement details. For the purpose of this algorithm, only the
co-ordinates of the atoms of both the complex and the unbound structures are
required, so the array function extracts these co-ordinates from the PDB
files. Two kinds of array function are defined: one, called the Coviarniy
function, extracts the co-ordinates of all the protein atoms, while the
other, named the arrays function, extracts only the co-ordinates of the
atoms in the interacting residues (amino acids) present at the binding
interface. The atom co-ordinates are present in the ATOM records of the PDB
file, where the following columns correspond to the x, y and z co-ordinates:
column 31 to 38 = x-coordinate
column 39 to 46 = y-coordinate
column 47 to 54 = z-coordinate

These co-ordinates are required for performing the translation and rotation
of the inhibitor (protein as ligand) with respect to the receptor (protein).
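The column layout above can be illustrated with a short sketch. It is written
in Python purely for illustration (the thesis implementation is in MATLAB),
and the function names parse_atom_coordinates and make_atom_line are
hypothetical, not the names used in the thesis.

```python
def parse_atom_coordinates(pdb_lines):
    """Return a list of (x, y, z) tuples, one per ATOM record."""
    coords = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HEADER, REMARK, HETATM and other records
        # PDB columns are 1-based and inclusive; Python slices are 0-based.
        x = float(line[30:38])  # columns 31 to 38
        y = float(line[38:46])  # columns 39 to 46
        z = float(line[46:54])  # columns 47 to 54
        coords.append((x, y, z))
    return coords


def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a fixed-width ATOM record (hypothetical helper for demos)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res_name, chain, res_seq, x, y, z)


if __name__ == "__main__":
    demo = [
        "HEADER    DEMO",
        make_atom_line(1, "N", "ALA", "A", 1, 11.104, 6.134, -6.504),
    ]
    print(parse_atom_coordinates(demo))
```

Slicing by fixed columns, rather than splitting on whitespace, is the safer
choice here because adjacent PDB fields may touch when values are wide.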

2.2.2 Identification of the Binding-site of Protein and 
Ligand 

The term binding site is used for the surface of interaction between the
protein and its inhibitor (ligand) molecule. It comprises a certain number
of residues (five to twenty) present at the interacting surface of both the
protein and the inhibitor molecule. The identification of the binding site
between the protein and the inhibitor is one of the most important and
critical steps of the algorithm. This has been done by using a recently
developed software called CASTp, which identifies pockets and cavities on
the protein surface analytically; the result was also reconfirmed from a
literature study. Once the residues present at the binding site are known,
the co-ordinates of all the atoms present in these residues are extracted in
the form of an array by the function developed in the first step, i.e. the
arrays function. The binding-site information is required for the
calculation of the electrostatic interaction between the atoms of the
protein and the inhibitor participating at the binding site. For the partial
charges on the atoms of the amino acids, standard CHARMM22 charges [24] have
been used. The arrays function returns the partial charges in addition to
the co-ordinates at the binding site.
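The selection of binding-site atoms can be sketched as follows. This is a
minimal Python illustration, not the thesis's MATLAB arrays function: the
name select_interface_atoms is hypothetical, and it assumes the interface is
given as a chain identifier plus a list of residue numbers. Partial charges
are deliberately left out, since a charge table would have to come from the
CHARMM22 parameter files.

```python
def select_interface_atoms(pdb_lines, chain_id, residue_numbers):
    """Return (residue number, atom name, (x, y, z)) for interface atoms.

    Only ATOM records whose chain identifier equals chain_id and whose
    residue sequence number is in residue_numbers are kept.
    """
    wanted = set(residue_numbers)
    selected = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]            # column 22: chain identifier
        res_seq = int(line[22:26])  # columns 23 to 26: residue number
        if chain == chain_id and res_seq in wanted:
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atom_name = line[12:16].strip()  # columns 13 to 16
            selected.append((res_seq, atom_name, xyz))
    return selected
```

A charge could later be attached to each returned atom by looking up its
residue and atom name in a separately loaded charge table.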


2.2.3 Translation and Rotation of the Ligand 

The ligand molecule is moved towards the protein molecule by means of the
translation of a point (atom) $P_i$ on the ligand to a point (atom) $P_o$ on
the protein, i.e. the receptor, as follows:

\[ r = \lVert P_o - P_i \rVert \tag{2.1} \]

r = translation distance between the two atoms.

To obtain the initial translation distance (r), two interacting atoms
present at the binding site are chosen, one each in the receptor and the
inhibitor, from the binding site defined in the previous step. The distance
between these two atoms is the distance by which the ligand has to be
translated with respect to the fixed receptor. The next step is to find the
orientation of the ligand with respect to the receptor which forms a minimal
energy complex. In order to obtain the minimal energy orientation of the
ligand, an optimization routine is called.
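The translation step can be sketched as below, in Python for illustration
only. The text specifies the translation distance; the sketch additionally
assumes that the ligand is moved rigidly along the line joining the two
chosen atoms, and the function names are hypothetical.

```python
import math


def translation_distance(p_o, p_i):
    """Distance r between receptor atom p_o and ligand atom p_i (eq. 2.1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_o, p_i)))


def translate_ligand(ligand_coords, p_o, p_i):
    """Shift every ligand atom by the vector p_o - p_i (assumed direction)."""
    shift = tuple(a - b for a, b in zip(p_o, p_i))
    return [tuple(c + s for c, s in zip(atom, shift)) for atom in ligand_coords]


if __name__ == "__main__":
    receptor_atom = (3.0, 4.0, 0.0)
    ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(translation_distance(receptor_atom, ligand[0]))
    print(translate_ligand(ligand, receptor_atom, ligand[0]))
```

Because every atom receives the same shift, the internal geometry of the
ligand is unchanged; only its position relative to the receptor changes.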

2.2.4 Calculation and Optimization of the
Electrostatic Interaction

Electrostatic interaction in the algorithm has been calculated on the basis
of point to point Coulombic interactions. The equation of Coulomb's law for
the force between two point charges is as follows:

\[ F = k \frac{Q_1 Q_2}{r^2} \tag{2.2} \]

F = force experienced between the charged particles.
$Q_1$, $Q_2$ = charges of the interacting particles.
r = distance between the two charges.





Here, k is the proportionality constant given by $k = 1/(4\pi\varepsilon)$,
where $\varepsilon$ is the permittivity of the medium, which is proportional
to its dielectric constant.

A function called Enrgfun has been developed for the calculation of the
electrostatic energy of interaction across the residues at the binding site
between the protein and the inhibitor. The electrostatic interaction energy
is calculated on the basis of the point to point Coulombic interaction
method as follows:

\[ \Delta E = \left( \frac{1}{D_{in}} - \frac{1}{D_{out}} \right) \frac{Q_1 Q_2}{r} \tag{2.3} \]

$D_{in}$ = dielectric constant of the protein, i.e. 2.
$D_{out}$ = dielectric constant of the outside medium (aqueous), i.e. 80.
$Q_1$, $Q_2$ = charges on the two interacting atoms, in Coulombs.
r = distance between the two interacting atoms, in Angstrom.
$\Delta E$ = net electrostatic interaction energy between the two atoms,
calculated in Joules.
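As an illustration of the point to point energy term, the sketch below
evaluates one atom pair and then sums over all receptor-ligand pairs. It is
written in Python rather than the thesis's MATLAB, the function names are
hypothetical, and it assumes that equation 2.3 has the form
(1/D_in - 1/D_out) Q1 Q2 / r with no further unit-conversion constant.

```python
import math

D_IN = 2.0    # dielectric constant of the protein (value given in the text)
D_OUT = 80.0  # dielectric constant of the aqueous medium (value in the text)


def pair_energy(q1, q2, r):
    """Energy term for one atom pair, in the assumed form of equation 2.3."""
    return (1.0 / D_IN - 1.0 / D_OUT) * q1 * q2 / r


def interface_energy(receptor_atoms, ligand_atoms):
    """Sum pair energies over all receptor-ligand atom pairs.

    Each atom is a tuple (x, y, z, charge).
    """
    total = 0.0
    for xr, yr, zr, qr in receptor_atoms:
        for xl, yl, zl, ql in ligand_atoms:
            r = math.dist((xr, yr, zr), (xl, yl, zl))
            total += pair_energy(qr, ql, r)
    return total
```

Opposite charges give a negative (favourable) term and like charges a
positive one, so the sum rewards orientations that bring oppositely charged
atoms close together.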

In order to calculate the electrostatic interaction energy between the two
proteins at the binding site, the interaction energy between the interacting
residues is first calculated at the level of individual atoms and is then
summed up to give the total energy of the complex. In order to prevent too
much interpenetration of the ligand into the receptor protein, a penalty is
applied when the distance between two atoms becomes less than 1.5 Å, as
follows:

\[ \mathrm{margin} = \lVert P_o - P_i \rVert^2 - d^2 \tag{2.4} \]

d = 1.5 Å
$P_o$ = a point (atom) on the receptor molecule
$P_i$ = a point (atom) on the inhibitor molecule

If margin < 0, then

\[ \Delta E = \Delta E - c \, (\mathrm{margin}) \tag{2.5} \]

where c = 2.

The value of the penalty was chosen after testing the algorithm on various
docking systems.
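The penalty rule can be sketched as follows, in Python for illustration. The
sketch assumes that the margin is the squared inter-atomic distance minus
the squared cut-off d, and that the penalty is applied pair by pair; the
function names are hypothetical.

```python
D_CUTOFF = 1.5   # cut-off distance d in Angstrom (value given in the text)
C_PENALTY = 2.0  # penalty weight c (value given in the text)


def margin(p_o, p_i, d=D_CUTOFF):
    """Squared distance between the two atoms minus the squared cut-off."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(p_o, p_i))
    return dist_sq - d ** 2


def penalized_energy(delta_e, p_o, p_i, c=C_PENALTY):
    """Apply the interpenetration penalty to a pair energy delta_e."""
    m = margin(p_o, p_i)
    if m < 0:
        # m is negative here, so subtracting c * m raises the energy.
        delta_e = delta_e - c * m
    return delta_e
```

For two atoms 1 Å apart the margin is 1 - 2.25 = -1.25, so the energy is
raised by 2.5; for atoms further apart than 1.5 Å the energy is unchanged.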

An optimization function of MATLAB called fmincon has been used to minimize
the electrostatic energy calculated from the energy function, so as to
obtain the minimum energy complex by rotating the ligand with respect to the
receptor in all possible directions within the supplied constraints. The
output of the optimization routine gives the minimum energy of the complex
and, most importantly, the orientation of the ligand, i.e. the angles α, β
and γ, in which it forms the minimal energy complex with the receptor. The
complete ligand molecule is then in the conformation of minimum energy with
respect to the receptor.
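The idea of searching over ligand orientations can be illustrated with the
sketch below. It is not the fmincon-based routine of the thesis: a coarse
grid search over three rotation angles is substituted so that the example
runs with the Python standard library alone. The rotation convention
(rotations about x, y and z applied in that order), the pivot point and all
function names are assumptions made for the example.

```python
import itertools
import math

D_IN, D_OUT = 2.0, 80.0         # dielectric constants given in the text
D_CUTOFF, C_PENALTY = 1.5, 2.0  # penalty distance (Angstrom) and weight


def rotation_matrix(alpha, beta, gamma):
    """Return Rz(gamma) * Ry(beta) * Rx(alpha); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]


def rotate_about(coords, pivot, angles):
    """Rotate every point in coords about pivot by the three angles."""
    rot = rotation_matrix(*angles)
    rotated = []
    for atom in coords:
        v = [a - p for a, p in zip(atom, pivot)]
        rotated.append(tuple(
            sum(rot[i][j] * v[j] for j in range(3)) + pivot[i]
            for i in range(3)))
    return rotated


def energy(receptor, ligand_coords, ligand_charges):
    """Penalized point-charge energy; receptor atoms are (x, y, z, q)."""
    scale = 1.0 / D_IN - 1.0 / D_OUT
    total = 0.0
    for xr, yr, zr, qr in receptor:
        for (xl, yl, zl), ql in zip(ligand_coords, ligand_charges):
            dist_sq = (xr - xl) ** 2 + (yr - yl) ** 2 + (zr - zl) ** 2
            r = max(math.sqrt(dist_sq), 1e-6)  # guard against division by zero
            total += scale * qr * ql / r
            margin = dist_sq - D_CUTOFF ** 2
            if margin < 0:
                total -= C_PENALTY * margin  # raises the energy on overlap
    return total


def grid_search(receptor, ligand_coords, ligand_charges, pivot, steps=12):
    """Try steps**3 orientations and return (lowest energy, angles)."""
    grid = [2.0 * math.pi * k / steps for k in range(steps)]
    best_energy, best_angles = None, None
    for angles in itertools.product(grid, repeat=3):
        moved = rotate_about(ligand_coords, pivot, angles)
        e = energy(receptor, moved, ligand_charges)
        if best_energy is None or e < best_energy:
            best_energy, best_angles = e, angles
    return best_energy, best_angles
```

A gradient-based constrained optimizer such as fmincon refines the angles
continuously, whereas this grid only samples them in steps of 30 degrees; the
grid is used here solely to keep the sketch dependency-free.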

2.2.5 Calculation of the Relative Error between Docked 
and Original Complexes 

In order to validate the results, the modelled protein complex structures
obtained by docking are compared to the original structures of the protein
complexes. This is done by calculating the distances between the atoms of
the protein and the ligand in the docked complex and, similarly, the
distances between the atoms in the original protein-ligand complex. A
relative error is then calculated from the difference between the two sets
of distances, taken with respect to the distances between the atoms of the
original complex, as follows:

\[ D = D_o - D_n \tag{2.6} \]

\[ N = \lVert D \rVert \tag{2.7} \]

\[ E_{rel} = \frac{N}{\lVert D_n \rVert} \tag{2.8} \]

$E_{rel}$ = relative error of the new docked complex with respect to the
original complex.
$D_n$ = distances between the atoms in the original complex.
$D_o$ = distances between the atoms in the new docked complex.
D = difference between the distances between atoms of the original complex
and those of the docked complex.
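The relative error can be sketched as below, in Python for illustration. The
sketch assumes that the distances of each complex are collected into a
vector with one entry per atom pair, in the same order for both complexes,
and that the norm is the Euclidean norm; the function name is hypothetical.

```python
import math


def relative_error(original_distances, docked_distances):
    """Norm of the distance differences divided by the norm of the
    original-complex distances (equations 2.6 to 2.8)."""
    diff = [d - o for d, o in zip(docked_distances, original_distances)]
    n = math.sqrt(sum(x * x for x in diff))
    norm_original = math.sqrt(sum(x * x for x in original_distances))
    return n / norm_original


if __name__ == "__main__":
    original = [3.0, 4.0]
    print(relative_error(original, [3.0, 4.0]))  # identical complexes
    print(relative_error(original, [6.0, 8.0]))  # every distance doubled
```

The error is zero when the docked distances reproduce the original ones, and
it can exceed 1 when the docked distances differ from the original ones by
more than the original distances themselves.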

In order to view the docked structures, the co-ordinates of the inhibitor in
the original complex PDB file are replaced by the new co-ordinates of the
inhibitor obtained from the minimum energy conformation after optimization.
This new PDB file can be viewed with any molecular viewing program, such as
RASMOL, SWISSPDBviewer and CHIME.
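Writing the docked co-ordinates back into the complex file can be sketched as
follows, in Python for illustration. The sketch assumes that the inhibitor
is identified by its chain identifier and that the new co-ordinates are
supplied in the same order as the inhibitor's ATOM records; the function
names are hypothetical.

```python
def replace_coordinates(atom_line, x, y, z):
    """Return atom_line with columns 31 to 54 overwritten by x, y, z."""
    line = atom_line.rstrip("\n").ljust(54)
    return line[:30] + "%8.3f%8.3f%8.3f" % (x, y, z) + line[54:]


def write_docked_model(complex_lines, inhibitor_chain, new_coords):
    """Replace the co-ordinates of one chain's ATOM records, in file order."""
    remaining = iter(new_coords)
    output = []
    for line in complex_lines:
        is_atom = line.startswith("ATOM") and len(line) > 21
        if is_atom and line[21] == inhibitor_chain:
            output.append(replace_coordinates(line, *next(remaining)))
        else:
            output.append(line.rstrip("\n"))
    return output
```

All other fields of each record (atom name, residue, occupancy and so on)
are left untouched, so the resulting lines remain valid ATOM records.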





2.3 Summary 


The aim of the algorithm developed here is to reconstruct a protein-inhibitor
complex from the unbound structures of the protein and the inhibitor by
optimization of the electrostatic interactions, calculated by the point to
point Coulombic interaction method according to equation 2.3. The receptor
(protein) molecule is kept untouched, while the ligand (protein inhibitor)
is translated and rotated, and the energy optimization is performed by using
the MATLAB function fmincon. The docked complex is compared with the
original complex by calculating a relative error value for the docked
complex with respect to the original complex by using equation 2.8.





Chapter 3 

Results and Discussions 


3.1 Results 

The results have been summarized in tables 3.1 and 3.2. These tables list
the biological systems used to test and verify the algorithm developed here,
along with the PDB codes of the bound and unbound structures and their
references. The E_rel values, i.e. the relative error of the docked
(reconstructed) complex structure with respect to the original complex
structure, calculated by equation 2.8, have also been listed in the tables.
Table 3.1 lists the E_rel values for docked complexes reconstructed from
unbound structures, while table 3.2 lists the E_rel values for docked
complexes reconstructed from disassembled structures.

Figures 3.1 to 3.13 plot the relative distances between atoms in the
original and the docked complexes. The docked complexes in figures 3.1 to
3.10 have been obtained from unbound structures, while those in figures 3.11
to 3.13 have been obtained from disassembled structures. It can be observed
from the figures that some distances between atoms in the docked complexes
are much longer than those in the original complexes. A possible reason is
that these regions have similar or very small potential patches, so that
other kinds of interactions or contacts occur at these surface patches which
the present algorithm does not take into account, such as hydrophobic
contacts, van der Waals interactions and short-range hydrogen bonding. At
certain other regions the distance between the atoms is shorter in the
docked complexes than in the original complexes; this is due to a certain
degree of interpenetration which the algorithm has not restricted. As
expected, cases where the plots of the original and docked complexes show
much variation have higher corresponding values of E_rel.





System                                   PDB codes (complex; unbound)  E_rel   References
---------------------------------------  ----------------------------  ------  -----------------------
trypsin/leech derived trypsin inhibitor  1LDT; 1ept/1ldt               0.1894  Stubbs et al. 1997
trypsin/soy-bean inhibitor               1AVW; 1ept/1avu               0.2858  Song and Suh 1998
barnase/barstar                          1BRS; 1bni/1bta               0.3744  Buckle et al. 1994
hydrolase/hydrolase inhibitor            1BVN; 1pif/2ait               0.5337  Wiegand et al. 1995
α-chymotrypsin/HPTI                      1CHO; 5cha/1ovo               0.7608  Fujinaga et al. 1987
acetylcholinesterase/fasciculin-II       1FSS; 2ace/1fsc               1.2547  Harel et al. 1995
β-trypsin/BPTI                           2PTC; 2ptn/4pti               2.6305  Marquart et al. 1983
trypsin/Bowman-Birk inhibitor            1SMF; 2ptn/1pi2               3.4769  Huang et al. 1994
subtilisin/eglin-C                       2SEC; 1scd/1tec               7.6844  McPhalen and James 1988
thermitase/eglin                         1TEC; 1thm/2sec               7.6844  Gros et al. 1989


Table 3.1: List of the complexes reconstructed from unbound structures used to 
test and verify the algorithm developed with their respective relative 
error values. 


System                         PDB codes (complex; unbound)  E_rel   References
-----------------------------  ----------------------------  ------  -----------------------
subtilisin/eglin-C             2SEC; 1scd/1tec               0.2829  McPhalen and James 1988
trypsin/Bowman-Birk inhibitor  1SMF; 2ptn/1pi2               0.4723  Huang et al. 1994
thermitase/eglin               1TEC; 1thm/2sec               0.8024  Gros et al. 1989


Table 3.2: List of the complexes reconstructed from disassembled structures used 
to test and verify the algorithm developed with their respective rela- 
tive error values. 

The following figures plot the distances between atoms of interacting residues
in the original complex and the docked complex for comparison. Here, the docked
complex has been obtained from unbound structures.

Axes in each plot: atom number of interacting residues (x), distance between
the atoms of interacting residues (y). Curves: original complex, docked
complex.

Figure 3.1: Relative distances (Å) in Docked complex Vs Original complex : 1LDT

Figure 3.2: Relative distances (Å) in Docked complex Vs Original complex : 1AVW

Figure 3.3: Relative distances (Å) in Docked complex Vs Original complex : 1BRS

Figure 3.4: Relative distances (Å) in Docked complex Vs Original complex : 1BVN

Figure 3.5: Relative distances (Å) in Docked complex Vs Original complex : 1CHO

Figure 3.6: Relative distances (Å) in Docked complex Vs Original complex : 1FSS

Figure 3.7: Relative distances (Å) in Docked complex Vs Original complex : 2PTC

Figure 3.8: Relative distances (Å) in Docked complex Vs Original complex : 1SMF

Figure 3.9: Relative distances (Å) in Docked complex Vs Original complex : 2SEC

Figure 3.10: Relative distances (Å) in Docked complex Vs Original complex : 1TEC




The following figures plot the distances between atoms of interacting residues
in the original complex and the docked complex for comparison. Here, the docked
complex has been obtained from disassembled structures.

Axes in each plot: atom number of interacting residues (x), distance between
the atoms of interacting residues (y).

Figure 3.11: Relative distances (Å) in Docked complex Vs Original complex : 1SMF

Figure 3.13: Relative distances (Å) in Docked complex Vs Original complex : 1TEC






3.2 Discussion 


The 1LDT complex [15] has a small ligand molecule which is 46 residues long.
Such small ligand molecules mostly form specific interactions at the binding
surface, as they tend to conserve certain polar and charged residues at the
surface. Here, both the protein and the ligand molecule have many charged
side chains (such as LYS and ARG) and polar side chains (such as GLN and
ASN) at the binding surface, and the presence of these residues causes
electrostatic interactions to play a key role in complex formation. Hence,
in 1LDT, owing to the small interacting surface between the protein and the
ligand, the interaction depends strongly upon the presence of these charged
residues. The ligand molecule in the complex 1AVW [17] is not as small as
that in 1LDT, but it shows the presence of some charged residues (such as
ASP and GLU) at the binding surface, which indicates that these residues are
important for the interaction at the binding surface. The protein molecule
also shows the presence of polar residues such as GLN and ASN at the
surface. Owing to the presence of both charged and polar side chains at the
binding site in 1LDT and 1AVW, electrostatic interaction helps in the
reconstruction of these complexes with relatively low errors of 18 and 28
percent, respectively.

In the complex 1BRS, both the protein and the ligand molecule show the
presence of charged residues (LYS, ASP, GLU and ARG) and some polar side
chains (such as THR, TRP and TYR) at the binding surface [21], which
indicates that these residues play a key role in binding. However, a certain
degree of hydrophobicity is also present at the interface owing to non-polar
residues (such as ILE, PHE, LEU, ALA and VAL), which are equally abundant at
the binding pocket. In the complex 1BVN [23] there is a large surface of
interaction between the protein and the ligand. Most of this surface is
composed of non-polar residues (such as VAL, ALA, LEU and PRO) forming a
strong hydrophobic patch at the binding site. The next most abundant
residues at the surface are polar residues such as THR, HIS, TYR, SER, GLN
and ASN, which are responsible for hydrogen bonding at the surface of
protein molecules. Thus, the large interacting surface covered to a certain
degree by a hydrophobic patch (1BVN), and the almost equal abundance of
polar and charged residues forming small charged potential patches at the
interacting surface (1BRS), indicate that the hydrophobic patch takes part
in the stabilization of the complex, in addition to the electrostatic
interaction which is responsible for conferring specificity to the binding
site. Hence, the two complexes 1BRS and 1BVN could be docked on the basis of
electrostatic interaction with relative errors of 37 and 50 percent,
respectively.

In the protein complexes 1CHO [14], 2PTC [18] and 1FSS [16], the binding
site shows the presence of a hydrophobic cluster occupying 40 percent (1CHO)
and 64 percent (2PTC and 1FSS) of the interacting surface [10]. This cluster
consists of the residues ILE, LEU, ALA, VAL, PHE, PRO, HIS, SER and TYR. The
presence of such a strong hydrophobic patch on the interacting surface of
these complexes indicates that hydrophobicity is a determinant of the
preferred binding site between the protein and the ligand [10]. In 1CHO,
very few charged residues (two) and some polar residues (four) are present
at the surface and the hydrophobic cluster occupies 40 percent of the
binding surface, so here the complex could be reconstructed only with a high
relative error of 76 percent. In 2PTC and 1FSS, the hydrophobic patch
occupies almost 64 percent of the binding surface and no charged residues
are present at the surface; thus, in these cases the complex could not be
reconstructed successfully on the basis of electrostatic interaction.

The protein-inhibitor complexes 1SMF [20], 2SEC [19] and 1TEC [22] are
special docking cases used here for testing the algorithm. In the complex
1SMF, the sequence of the unbound ligand is different from that of the bound
ligand present in the complex, and the sequence of the inhibitor has also
been highly truncated, i.e. it is only 17 residues long in the complex,
while its original size is 46 residues. In the complexes 1TEC and 2SEC, the
unbound structure of the inhibitor molecule is not available, so the
structure of the inhibitor in complex with some other protein has been used;
although both inhibitor molecules belong to the same class, their sequences
share very little identity. The algorithm therefore could not reconstruct
the complex successfully from the unbound structures in these cases, and a
different approach, termed docking from disassembled structures, is used
instead (here, the structure of the ligand is obtained from the original
complex itself). This approach of docking from disassembled structures [2]
has been used in these three special cases, i.e. 1SMF, 2SEC and 1TEC, and
the complexes have been reconstructed with relative errors of 47 percent, 28
percent and 68 percent, respectively. In 1SMF, the binding surface contains
small charged potential patches, with an almost equal distribution of polar
and non-polar residues and a few charged residues. In 2SEC, the binding
pocket contains both charged and polar residues at the surface, so the
complex could be reconstructed with a comparatively low relative error of 28
percent. In 1TEC, however, 78 percent of the binding surface [10] is covered
by non-polar residues producing a strong hydrophobic cluster at the binding
site; thus, the reconstruction of this complex on the basis of electrostatic
interaction produces a high relative error.





Chapter 4 
Conclusions 

4.1 Summary 

As observed from the results and discussions mentioned above, the following
can be concluded and summarized:

In protein complexes where many oppositely charged side chains and some
polar side chains are present at the binding interface of the protein and
the ligand, such that these charges play a dominant or equivalent role
(along with some other kinds of interactions) in the interaction between the
protein and the ligand, it is possible to reconstruct the complex from the
unbound structures using this algorithm with a relative error ranging from
eighteen to fifty percent, depending upon the role played by electrostatic
interaction in the complex formation. This is observed in the initial four
cases, i.e. the complexes 1LDT, 1AVW, 1BRS and 1BVN, and in the complexes
reconstructed from disassembled structures, i.e. 2SEC and 1SMF. In protein
complexes where the ligand molecule is very small, the interaction strongly
depends on the presence of charged side chains at the interface because of
the small interacting surface; this is observed in the complex 1LDT.

In those protein complexes where the binding interface has mainly non-polar
and some polar side chains, and very few or no charged side chains,
electrostatic interaction plays a negligible or minor role in the complex
formation. In such cases the relative error of reconstruction of the complex
on the basis of electrostatic interaction is comparatively much higher,
owing to the absence of strong charged potential patches. This is observed
in the protein complexes 1CHO, 2PTC and 1FSS, where at least 40 percent of
the binding surface is covered by a hydrophobic cluster of strongly
non-polar residues [10]. Thus, in such cases it is not possible to dock the
unbound structures on the basis of point to point Coulombic electrostatic
calculations, as the presence of similar partial charges at the surface
contributes to a repulsive electrostatic interaction.

In the protein complexes 1SMF, 2SEC and 1TEC, where the unbound structure of
the inhibitor molecule is not available for the specific reasons mentioned
in the discussion above, the complex has been reconstructed from the
disassembled structures of the protein and the ligand. This method
reconstructs the complexes 1SMF and 2SEC with comparatively low relative
error, as the binding site in these complexes has some oppositely charged
potential patches. In 1TEC, the relative error is comparatively higher owing
to the presence of a strong hydrophobic patch which covers 78 percent of the
binding surface [10] and the absence of charged residues at the binding
surface.

While performing the literature survey, developing the algorithm and testing
it on various cases, and on the basis of the results and observations, I
would also like to infer the following, apart from the above mentioned
conclusions. Most charged groups of proteins are on the surface of the
protein, where they do not strongly interact with charged groups from other
proteins owing to the high dielectric constant of the water solvent, but are
stabilized by hydrogen bonding and polar interactions with the water and
with similar side chains present on the surface of other proteins.
Electrostatic interactions in water are weaker than within the protein
itself (though there are few charged residues in the protein interior)
because of water's high dielectric constant (which results from the tendency
of the large dipoles of water molecules to align with any electric field).
Electrostatic interactions can be attractive or repulsive, varying from case
to case depending upon the distribution of charged residues on the surface.
Electrostatic interactions are responsible for conferring specificity at the
binding site, and their role in the formation and stabilization of the
complex varies. One more observation is that some PDB files have data
missing (as reported for the unbound structures of 1AVW and 1BRS) for a
certain number of residues or atoms in disordered loop regions involved in
the binding site. Such regions are known to participate frequently in
binding sites and to form enzyme (protein) active sites, as they are rich in
charged and polar side chains. Some information which could have been
important for the energy calculations is therefore at times missing from the
PDB file, though this does not affect the results drastically (unless
information about a complete loop participating at the binding site is
missing).

The algorithm developed here can be used more effectively on the basis of
the observations and conclusions obtained from the above mentioned ten cases
of protein-inhibitor complexes. In order to use it in cases where the
binding site information is not available, so as to find the approximate
site of binding, the solutions having an energy of very low value, of the
order of 10“^^ or even less, and a very low relative error should be
considered, and the structures obtained should be properly analyzed on the
basis of available experimental information regarding the probable binding
site (which is sometimes available in the literature). Before docking, one
should also check the sequences of the bound and unbound ligand molecules
for any differences. This algorithm can be used for secondary screening of
the candidate solutions obtained from an initial screening by geometric
docking, in order to filter out the best of the possible solutions. This
kind of approach has been implemented by recent geometric-electrostatic
algorithms and has been found to be successful in cases where the
interacting surface has large patches of different potential rather than
similar kinds of patches or a homogeneous surface. Similarly here, the
presence of oppositely charged side chains at the interface is important for
successful docking based on point to point Coulombic interactions.





4.2 Scope of Future Work 


This algorithm uses the point to point Coulombic interaction method to
calculate the electrostatic interactions, where the dielectric constant of
the protein is taken to be 2 and that of the outside medium (considered to
be aqueous) is taken to be 80, based on the continuum dielectric solvation
model. One can instead use a distance-dependent dielectric model, or use the
Poisson-Boltzmann equation to calculate the electrostatic potential, but
that will be computationally more rigorous and expensive; in that context,
calculations based on the presently used method are fast. Other short range
energy potentials such as H-bonding, van der Waals interactions and
hydrophobic interactions may be included in order to calculate the complete
energy of the complex.





Bibliography 


[1] Inbal Halperin, Buyong Ma, Haim Wolfson, and Ruth Nussinov (2002).
Principles of Docking : An Overview of Search Algorithms and a Guide
to Scoring Functions. Proteins : Structure, Function, and Genetics,
409-443.

[2] Alexander Heifetz, Ephraim Katchalski-Katzir, and Miriam Eisenstein
(2002). Electrostatics in protein-protein docking. Protein Science, 11,
571-587.

[3] Felix B Sheinerman, Raquel Norel and Barry Honig (2000). Electrostatic
aspects of protein-protein interactions. Current Opinion in Structural
Biology, 10, 153-159.

[4] Kay-Eberhard Gottschalk, Hani Neuvirth and Gideon Schreiber (2004).
A novel method for scoring of docked protein complexes using predicted
protein-protein binding sites. Protein Engineering, Design and Selection,
vol. 17 no. 2, 183-189.

[5] Fernandez-Recio, J., Totrov, M., and Abagyan, R. (2002). Screened
charge electrostatic model in protein-protein docking simulations. Pac
Symp Biocomput., 552-63.

[6] A. Heifetz and M. Eisenstein (2003). Effect of local shape modifications
of molecular surfaces on rigid-body protein-protein docking. Protein
Engineering, Vol. 16, No. 3, 179-185.

[7] Sharp, K. and Honig, B. (1990). Electrostatic Interactions in Macro- 
molecules : Theory and Applications. Ann. Rev. Biophys. Biophys. 
Chem 19, 301-332. 

[8] Jeffrey G. Mandell, Victoria A. Roberts, Michael E. Pique, Vladimir
Kotlovyi, Julie G. Mitchell, Erik Nelson, Igor Tsigelny and Lynn F.
Ten Eyck (2001). Protein docking using continuum electrostatics and
geometric fit. Protein Engineering, Vol. 14, No. 2, 105-113.

[9] Raquel Norel, Felix Sheinerman, Donald Petrey and Barry Honig
(2001). Electrostatic contributions to protein-protein interactions : Fast
energetic filters for docking and their physical basis. Protein Science,
10, 2147-2161.

[10] L. Young, R.L. Jernigan, and D.G. Covell (1994). A role for surface
hydrophobicity in protein-protein recognition. Protein Science, 3,
717-729.

[11] David M. Lorber, Maria K. Udo and Brian K. Shoichet (2002).
Protein-protein docking with multiple residue conformations and residue
substitutions. Protein Science, 11, 1393-1408.

[12] Henry A. Gabb, Richard M. Jackson and Michael J. E. Sternberg (1997).
Modelling Protein Docking using Shape Complementarity, Electrostatics
and Biochemical Information. J. Mol. Biol. 272, 106-120.

[13] Connolly ML. Solvent-accessible surfaces of proteins and nucleic acids.
Science, 221, 709-713.

[14] Fujinaga M, Sielecki AR, Read RJ, Ardelt W, Laskowski M Jr, James
MN (1987). Crystal and molecular structures of the complex of
alpha-chymotrypsin with its inhibitor turkey ovomucoid third domain at 1.8
A resolution. J Mol Biol. 195 (2), 397-418.

[15] Milton T. Stubbs, Robert Morenweiser, Jörg Stürzebecher, Margit
Bauer, Wolfram Bode, Robert Huber, Gerd P. Piechottka, Gabriele
Matschiner, Christian P. Sommerhoff, Hans Fritz and Ennes A. Auerswald
(1997). The Three-dimensional Structure of Recombinant Leech-derived
Tryptase Inhibitor in Complex with Trypsin. J. of Bio. Chem.
Vol. 272, No. 32, 19931-19937.

[16] Harel M, Kleywegt GJ, Ravelli RB, Silman I, Sussman JL. (1995).
Crystal structure of an acetylcholinesterase-fasciculin complex : interaction
of a three-fingered toxin from snake venom with its target. Structure.
Vol. 3, No. 12, 1355-66.





[17] Song HK, Suh SW (1998). Kunitz-type soybean trypsin inhibitor
revisited : refined structure of its complex with porcine trypsin reveals an
insight into the interaction between a homologous inhibitor from Erythrina
caffra and tissue-type plasminogen activator. J Mol Biol. Vol. 215,
No. 2, 347-63.

[18] M. Marquart, J. Walter, J. Deisenhofer, W. Bode and R. Huber (1983).
The geometry of the reactive site and of the peptide groups in trypsin,
trypsinogen and its complexes with inhibitors. Acta Cryst. B39, 480-490.

[19] McPhalen CA, James MN (1988). Structural comparison of two serine
proteinase-protein inhibitor complexes : eglin-c-subtilisin Carlsberg and
CI-2-subtilisin Novo. Biochemistry. Vol. 27, No. 17, 6582-98.

[20] Y. Li, Q. Huang, S. Zhang, S. Liu, C. Chi and Y. Tang (1994). Studies 
on an artificial trypsin inhibitor peptide derived from the mung bean 
trypsin inhibitor : chemical synthesis, refolding, and crystallographic 
analysis of its complex with trypsin. Journal of Biochemistry, Vol 116, 
Issue 1, 18-25. 

[21] Buckle AM, Schreiber G, Fersht AR. (1994). Protein-protein recogni- 
tion : crystal structural analysis of a barnase-barstar complex at 2.0-A 
resolution. Biochemistry. Vol. 33, No. 30, 8878-89. 

[22] Gros P, Fujinaga M, Dijkstra BW, Kalk KH, Hol WG. (1989).
Crystallographic refinement by incorporation of molecular dynamics :
thermostable serine protease thermitase complexed with eglin c. Acta
Crystallogr B. Vol. 45, Pt. 5, 488-99.

[23] Wiegand, G. Epp, O. Huber, R. (1995). The crystal structure of porcine 
pancreatic alpha-amylase in complex with the microbial inhibitor Ten- 
damistat. J. Mol. Biol. 247, 99-110. 

[24] Delphi Documentation. (2000) Delphi module. CHARMM22 charges. 

[25] Insight II Documentation. (2000) Docking module.

[26] CAPRI: Critical Assessment of PRediction of Interactions.
Documentation and Targets (http://capri.ebi.ac.uk/)





[27] Branden, C. and Tooze, J. Introduction to Protein Structure.
(2nd edition). Garland Publishing, New York.

[28] Creighton, T. E. Proteins. (2nd edition). W. H. Freeman and Co.,
New York.

[29] Voet, D. and Voet, J. G. Biochemistry. (2nd edition). John Wiley and
Sons, New York.

[30] Nelson, D. L. and Cox, M. M. Lehninger Principles of Biochemistry.
(3rd edition). Macmillan Worth Publishers, UK.





Appendix A 
Input File Format 


filename1='*.pdb'; enzyme file
filename2='*.pdb'; inhibitor file
filename3='*.pdb'; complex file
modelfilename='*.pdb'; new pdb filename for the model
Resvec1=[ ]; interacting residues of enzyme
ChainID1='*'; chaintype of enzyme residue
Resvec2=[ ]; interacting residues of inhibitor
ChainID2='*'; chaintype of inhibitor
Atmnum1=[ ]; atom no. of the receptor for translation
Atmnum2=[ ]; atom no. of the ligand for translation
OrgResvec1=[ ]; input('give residue nos. of original rec:')
OrgResvec2=[ ]; input('give residue nos. of original lig:')
OrgChainID1='*'; input('give chainid of original rec:')
OrgChainID2='*'; input('give chainid of original lig:')
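
As an illustration, a filled-in input file might look like the sketch below. It assumes the file is read as a MATLAB-style script, which the assignment syntax suggests but the text does not state. All file names, residue numbers, chain identifiers and atom numbers are hypothetical placeholders, not values used in this work.

```matlab
% Hypothetical example input: every value below is a placeholder.
filename1 = 'enzyme.pdb';          % enzyme file (assumed name)
filename2 = 'inhibitor.pdb';       % inhibitor file (assumed name)
filename3 = 'complex.pdb';         % complex file (assumed name)
modelfilename = 'model.pdb';       % new pdb filename for the model

Resvec1  = [10 11 12];             % interacting residues of enzyme
ChainID1 = 'A';                    % chain type of enzyme residues
Resvec2  = [20 21 22];             % interacting residues of inhibitor
ChainID2 = 'B';                    % chain type of inhibitor

Atmnum1 = [100];                   % atom no. of the receptor for translation
Atmnum2 = [200];                   % atom no. of the ligand for translation

OrgResvec1  = [10 11 12];          % residue nos. of original receptor
OrgResvec2  = [20 21 22];          % residue nos. of original ligand
OrgChainID1 = 'A';                 % chain id of original receptor
OrgChainID2 = 'B';                 % chain id of original ligand
```

The residue vectors are written as row vectors of integers and the chain identifiers as single characters, following the `[ ]` and `'*'` placeholders in the template above.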





Appendix B 


Amino-Acid Codes and
Chemical Structure


Name            Symbol                R group
                3 Lett.   1 Lett.

Aspartate       Asp       D           ⁻OOC-CH₂-
Glutamate       Glu       E           ⁻OOC-CH₂-CH₂-
Lysine          Lys       K           H₃N⁺-CH₂-CH₂-CH₂-CH₂-
Arginine        Arg       R           H₂N-C(=NH₂⁺)-NH-CH₂-CH₂-CH₂-
Histidine       His       H           (imidazol-4-yl)-CH₂-
Tyrosine        Tyr       Y           HO-C₆H₄-CH₂-
Tryptophan      Trp       W           (indol-3-yl)-CH₂-
Phenylalanine   Phe       F           C₆H₅-CH₂-
Cysteine        Cys       C           HS-CH₂-
Methionine      Met       M           CH₃-S-CH₂-CH₂-
Serine          Ser       S           HO-CH₂-
Threonine       Thr       T           CH₃-CH(OH)-
Asparagine      Asn       N           H₂N-CO-CH₂-
Glutamine       Gln       Q           H₂N-CO-CH₂-CH₂-
Glycine         Gly       G           H-
Alanine         Ala       A           CH₃-
Valine          Val       V           (CH₃)₂CH-
Leucine         Leu       L           (CH₃)₂CH-CH₂-
Isoleucine      Ile       I           CH₃-CH₂-CH(CH₃)-
Proline         Pro       P           -CH₂-CH₂-CH₂- (ring closed onto the backbone N)

Each R group is attached to the α-carbon of the common backbone,
⁺H₃N-CH(R)-COO⁻. In proline the side chain also bonds to the backbone
nitrogen, forming a ring.




Appendix C 

Some Important URLs 


1) PDB (Protein Databank):

<http://www.rcsb.org/pdb/>

2) Molecular Viewing Software (RASMOL, CHIME, SWISSPDBviewer):

<http://www.rcsb.org/pdb/help-graphics.html#rasmol_download>

3) CAPRI (Critical Assessment of PRediction of Interactions):

<http://capri.ebi.ac.uk/>

4) CASTp:

<http://cast.engr.uic.edu/cast/>

5) DELPHI documentation:

<http://cast.engr.uic.edu/cast/>

6) INSIGHTII documentation:

<www.chem.uh.edu/Courses/Lynch/Chem6397/Tutorials/InsightII/insight2.html>





